add et export with gguf with test #245
Approving with mixed emotions.
model_to_pte = model
model_to_dso = model
else:
if output_pte_path:
This is very kludgy and I would prefer to export to int4 and then handle it from there. Basing front end decisions on backend is a very bad practice because we're going to end up in a world of hurt.
Kimish and I had discussed doing a transform from int4 -> a8w4dq. Right now we just get a de-quantized model.
Please plan to land that asap, Kimish?
cc: @kimishpatel
if executorch_export_available:
print(f"Exporting model using Executorch to {output_pte_path}")
export_model_et(model, builder_args.device, args.output_pte_path, args)
export_model_et(model_to_pte, builder_args.device, args.output_pte_path, args)
I don't like this at all :( But we're out of runway, so I will approve for now.
output_dso_path = str(os.path.abspath(output_dso_path))
print(f"Exporting model using AOT Inductor to {output_dso_path}")
export_model_aoti(model, builder_args.device, output_dso_path, args)
export_model_aoti(model_to_dso, builder_args.device, output_dso_path, args)
Ditto.
f8884e6 to bc92599
bc92599 to 9563191
* add et export with gguf with test
* fix generate too
* add gguf path to generate
ExecuTorch (ET) does not support _weight_int4pack_mm, so this PR adds gguf_kwargs, which can be passed when building the model to control whether the GGUF file is loaded as quantized (load_as_quantized). If load_as_quantized=False, the GGUF weights are converted to floating point.
Also adds a test for torchchat export + generate with a GGUF file to et.yml.
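
For readers skimming the thread, here is a minimal sketch of the selection logic the description and the diff snippets imply: one model per export target, with the GGUF weights de-quantized for the ExecuTorch path. This is not the PR's code. BuilderArgs, load_model and models_for_export are hypothetical stand-ins, the dict returned by load_model only imitates a model, and the assumption that load_as_quantized defaults to True is mine.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BuilderArgs:
    # Hypothetical stand-in for the builder arguments discussed in the PR.
    gguf_path: Optional[str] = None
    gguf_kwargs: Optional[dict] = None


def load_model(builder_args: BuilderArgs) -> dict:
    """Stand-in loader: returns a dict describing what would be loaded."""
    if builder_args.gguf_path is None:
        return {"source": "checkpoint", "weights": "as-is"}
    kwargs = builder_args.gguf_kwargs or {}
    # Assumption: quantized loading is the default when nothing is specified.
    quantized = kwargs.get("load_as_quantized", True)
    return {
        "source": builder_args.gguf_path,
        "weights": "int4" if quantized else "float",
    }


def models_for_export(
    builder_args: BuilderArgs,
    output_pte_path: Optional[str],
    output_dso_path: Optional[str],
) -> Tuple[Optional[dict], Optional[dict]]:
    """Pick the model handed to each exporter."""
    if builder_args.gguf_path is None:
        # No GGUF file: both exporters share the same model.
        model = load_model(builder_args)
        return model, model

    model_to_pte = model_to_dso = None
    if output_pte_path:
        # ExecuTorch path: load GGUF weights as floating point.
        model_to_pte = load_model(
            BuilderArgs(builder_args.gguf_path, {"load_as_quantized": False})
        )
    if output_dso_path:
        # AOT Inductor path: keep the quantized weights.
        model_to_dso = load_model(
            BuilderArgs(builder_args.gguf_path, {"load_as_quantized": True})
        )
    return model_to_pte, model_to_dso
```

The sketch also makes the reviewer's objection concrete: the loader's behaviour is chosen by which backend will consume the model, which is the front-end/back-end coupling the review asks to replace with an int4 -> a8w4dq transform.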